Single-cell transcriptomics reveal transcriptional programs underlying male and female cell fate during Plasmodium falciparum gametocytogenesis

The Plasmodium falciparum life cycle includes an obligate transition between a human and a mosquito host. Gametocytes are responsible for transmission from the human to the mosquito vector, where gamete fusion followed by meiosis occurs. To elucidate how male and female gametocytes differentiate in the absence of sex chromosomes, we perform FACS-based cell enrichment of a P. falciparum gametocyte reporter line followed by single-cell RNA-seq. In our analyses, we define the transcriptional programs and predict candidate driver genes underlying male and female development, including genes from the ApiAP2 family of transcription factors. A motif-driven gene regulatory network analysis indicates that AP2-G5 specifically modulates male development. Additionally, genes linked to the inner membrane complex, involved in morphological changes, are uniquely expressed in the female lineage. The transcriptional programs of male and female development detailed herein allow further exploration of the evolution of sex in eukaryotes and provide targets for the future development of transmission-blocking therapies.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

nature portfolio | reporting summary
April 2023

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

Research involving human participants, their data, or biological material
Policy information about studies with human participants or human data. See also policy information about sex, gender (identity/presentation), and sexual orientation and race, ethnicity and racism.

Reporting on sex and gender
Reporting on race, ethnicity, or other socially relevant groupings

Ethics oversight
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative. Please specify the socially constructed or socially relevant categorization variable(s) used in your manuscript and explain why they were used. Please note that such variables should not be used as proxies for other socially constructed/relevant variables (for example, race or ethnicity should not be used as a proxy for socioeconomic status). Provide clear definitions of the relevant terms used, how they were provided (by the participants/respondents, the researchers, or third parties), and the method(s) used to classify people into the different categories (e.g. self-report, census or administrative data, social media data, etc.). Please provide details about how you controlled for confounding variables in your analyses.
Describe the covariate-relevant population characteristics of the human research participants (e.g. age, genotypic information, past and current diagnosis and treatment categories). If you filled out the behavioural & social sciences study design questions and have nothing to add here, write "See above." Describe how participants were recruited. Outline any potential self-selection bias or other biases that may be present and how these are likely to impact results.
Identify the organization(s) that approved the study protocol.
Parasites were sorted at four consecutive time points (Days 1, 3, 5, and 7) to cover the P. falciparum gametocyte developmental period, from early to mature male and female gametocytes. The total sample size was determined empirically, based on standards in the field. We used FACS to specifically enrich gametocytes on the basis of tdTomato fluorescence, followed by droplet encapsulation on 10X Genomics Chromium Controller flow cells. Approximately 10,000 cells from each time point were sorted into individual tubes. The cells were then pooled evenly into a single tube, giving a total of 40,000 cells from the four time points, and approximately 20,000 cells were loaded onto the flow cell. To ensure the reproducibility of the gametocyte enrichment strategy, the procedure was performed in two biological replicates using the predefined FACS settings for all data points collected. Single-cell data generated using 10X Chromium Single Cell 3' reagent kits (v3) were mapped to the P. falciparum reference strain 3D7, version 50 (ASM276v2). Based on our filtering criteria, 4,555 single-cell transcriptomes passed the cut-off for downstream analysis. This sample size exceeds that of comparable studies in the field; for example, Gomes et al. (Nature, 2022) used 2,076 single-cell gametocyte transcriptomes in their analyses.
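The cell numbers reported for the sorting and loading steps can be checked with simple arithmetic. The sketch below assumes exactly even pooling of the four time points and treats the stated approximate figures (about 10,000 sorted cells per time point, about 20,000 cells loaded) as exact; the function name pooling_summary is hypothetical and used only for illustration.

```python
# Minimal sketch of the pooling arithmetic, assuming exactly even pooling
# of the four time points (the source gives approximate figures only).

TIME_POINTS = ["Day 1", "Day 3", "Day 5", "Day 7"]
SORTED_PER_TIME_POINT = 10_000   # approximate, from the text
CELLS_LOADED = 20_000            # approximate, from the text


def pooling_summary(sorted_per_tp: int, n_time_points: int, loaded: int) -> dict:
    """Return the pooled total, the fraction of the pool loaded onto the
    flow cell, and the expected cells loaded per time point (even pooling)."""
    pooled_total = sorted_per_tp * n_time_points
    return {
        "pooled_total": pooled_total,
        "fraction_loaded": loaded / pooled_total,
        "loaded_per_time_point": loaded / n_time_points,
    }


summary = pooling_summary(SORTED_PER_TIME_POINT, len(TIME_POINTS), CELLS_LOADED)
print(summary)
# {'pooled_total': 40000, 'fraction_loaded': 0.5, 'loaded_per_time_point': 5000.0}

# Size of the retained dataset relative to the cited comparison study.
RETAINED = 4555
COMPARISON = 2076  # Gomes et al., Nature, 2022, as cited in the text
print(f"{RETAINED / COMPARISON:.2f}x")
# 2.19x
```

Under the even-pooling assumption, roughly half of the pooled cells were loaded, corresponding to about 5,000 cells per time point. This says nothing about how many of those cells per time point passed filtering, which the text does not report.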


Blinding
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Our lab was part of the study that produced the reporter cell line used here. The cell line has previously been tested for mycoplasma, but was not specifically tested during the course of culturing for this study. There are no commonly misidentified lines in this study.

Describe the methods by which all novel plant genotypes were produced. This includes those generated by transgenic approaches, gene editing, chemical/radiation-based mutagenesis and hybridization. For transgenic lines, describe the transformation method, the number of independent lines analyzed and the generation upon which experiments were performed. For gene-edited lines, describe the editor used, the endogenous sequence targeted for editing, the targeting guide RNA sequence (if applicable) and how the editor was applied.
Report on the source of all seed stocks or other plant material used. If applicable, state the seed stock centre and catalogue number. If plant specimens were collected from the field, describe the collection location, date and sampling procedures.
Describe any authentication procedures for each seed stock used or novel genotype generated. Describe any experiments used to assess the effect of a mutation and, where applicable, how potential secondary effects (e.g. second site T-DNA insertions, mosaicism, off-target gene editing) were examined.

Materials
Data were filtered based on gene and cell counts to remove low-quality cells.

Single-cell data (including the four time points) were produced from two biological replicates; both replicates were successful and produced similar results.

The cell line used is the Plasmodium falciparum gametocyte-producing NF54 Peg4-tdTomato transgenic line described in McLean et al. (2019).
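The data-exclusion step (filtering on gene and cell counts to remove low-quality cells) can be illustrated with a minimal sketch. The exact criteria are not stated in this section, so the code assumes that "gene counts" means the number of genes detected per cell and "cell counts" the number of cells in which a gene is detected. The function name filter_counts, the toy matrix, and the thresholds are hypothetical and are not the study's settings.

```python
import numpy as np


def filter_counts(counts: np.ndarray, min_genes: int, min_cells: int):
    """Filter a cells x genes count matrix (hypothetical helper).

    Cells with fewer than `min_genes` detected genes (count > 0) are dropped
    first; genes detected in fewer than `min_cells` of the remaining cells
    are then dropped. Returns the filtered matrix and boolean masks over the
    original cells and genes.
    """
    genes_per_cell = (counts > 0).sum(axis=1)
    cell_mask = genes_per_cell >= min_genes
    kept_cells = counts[cell_mask]
    cells_per_gene = (kept_cells > 0).sum(axis=0)
    gene_mask = cells_per_gene >= min_cells
    return kept_cells[:, gene_mask], cell_mask, gene_mask


# Toy example (illustrative thresholds only; real thresholds are study-specific).
toy = np.array([
    [5, 0, 1, 0],  # 2 genes detected
    [0, 0, 0, 1],  # 1 gene detected -> dropped with min_genes=2
    [3, 2, 0, 0],  # 2 genes detected
    [1, 1, 1, 0],  # 3 genes detected
])
filtered, cells, genes = filter_counts(toy, min_genes=2, min_cells=2)
print(filtered.shape)    # (3, 3)
print(cells.tolist())    # [True, False, True, True]
print(genes.tolist())    # [True, True, True, False]
```

Cells are filtered before genes, so a gene detected only in discarded cells is also removed (the last gene in the toy matrix). Scanpy offers comparable helpers (`scanpy.pp.filter_cells` with `min_genes`, `scanpy.pp.filter_genes` with `min_cells`); whether these or other tools were used here is not stated.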
Raw sequencing datasets are available from the Gene Expression Omnibus (GEO) under accession number GSE226145. The raw processed expression matrices, metadata, custom tracks (JSON), and cis-target motif databases are available via Zenodo [https://zenodo.org/deposit/7652581]. The data can be explored interactively at [https://mubasher-mohammed.shinyapps.io/mohammedetal/].

Use the terms sex (biological attribute) and gender (shaped by social and cultural circumstances) carefully in order to avoid confusing both terms. Indicate if findings apply to only one sex or gender; describe whether sex and gender were considered in study design; whether sex and/or gender was determined based on self-reporting or assigned and methods used. Provide in the source data disaggregated sex and gender data, where this information has been collected, and if consent has been obtained for sharing of individual-level data; provide overall numbers in this Reporting Summary. Please state if this information has not been collected. Report sex- and gender-based analyses where performed, justify reasons for lack of sex- and gender-based analysis.